Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads
Authors
Abstract
To reduce storage overhead, cloud file systems are transitioning from replication to erasure codes. This process has revealed new dimensions on which to evaluate the performance of different coding schemes: the amount of data used in recovery and when performing degraded reads. We present an algorithm that finds the optimal number of codeword symbols needed for recovery for any XOR-based erasure code and produces recovery schedules that use a minimum amount of data. We differentiate popular erasure codes based on this criterion and demonstrate that the differences improve I/O performance in practice for the large block sizes used in cloud file systems. Several cloud systems [15, 10] have adopted Reed-Solomon (RS) codes, because of their generality and their ability to tolerate larger numbers of failures. We define a new class of rotated Reed-Solomon codes that perform degraded reads more efficiently than all known codes, but otherwise inherit the reliability and performance properties of Reed-Solomon codes.
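The idea of choosing recovery equations so that fewer surviving symbols are read can be illustrated with a small sketch. This is not the algorithm from the paper: it is a simplified brute-force search over a hypothetical toy XOR code (two data disks, two parity disks, two symbols per disk). It also assumes that each lost symbol is rebuilt from a single parity equation containing no other lost symbol. The symbol names and the `min_read_recovery` helper are invented for illustration.

```python
from itertools import product

# Hypothetical toy XOR code: data disks D0 and D1, parity disks P and Q,
# two symbols per disk. Each equation is a set of symbols whose XOR is zero.
EQUATIONS = [
    {"d00", "d10", "p0"},  # p0 = d00 ^ d10 (row parity, row 0)
    {"d01", "d11", "p1"},  # p1 = d01 ^ d11 (row parity, row 1)
    {"d00", "d11", "q0"},  # q0 = d00 ^ d11 (diagonal parity)
    {"d01", "d10", "q1"},  # q1 = d01 ^ d10 (diagonal parity)
]


def min_read_recovery(lost, equations):
    """Return a smallest set of surviving symbols that rebuilds `lost`.

    Simplifying assumption: every lost symbol is recovered from one equation
    that contains it and no other lost symbol. Returns None if some lost
    symbol has no such equation.
    """
    lost = set(lost)
    # For each lost symbol, list the equations that can rebuild it directly.
    options = [
        [eq for eq in equations if sym in eq and len(eq & lost) == 1]
        for sym in sorted(lost)
    ]
    best = None
    # Try every combination of one equation per lost symbol.
    for choice in product(*options):
        reads = set().union(*choice) - lost
        if best is None or len(reads) < len(best):
            best = reads
    return best


# Disk D0 fails, so both of its symbols must be rebuilt.
schedule = min_read_recovery({"d00", "d01"}, EQUATIONS)
print(sorted(schedule))
```

In this toy layout, rebuilding both symbols from row parity alone reads four symbols, while mixing one row equation with one diagonal equation reuses `d10` and reads only three. The brute-force enumeration grows exponentially with the number of lost symbols, so it is only suitable for illustrating the criterion on tiny codes.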
Similar resources
A Tale of Two Erasure Codes in HDFS
Distributed storage systems are increasingly transitioning to the use of erasure codes since they offer higher reliability at significantly lower storage costs than data replication. However, these codes trade off recovery performance, as they require multiple disk reads and network transfers to reconstruct an unavailable data block. As a result, most existing systems use an erasure code eith...
On the I/O Costs of Some Repair Schemes for Full-Length Reed-Solomon Codes
Network transfer and disk read are the most time consuming operations in the repair process for node failures in erasure-code-based distributed storage systems. Recent developments on Reed-Solomon codes, the most widely used erasure codes in practical storage systems, have shown that efficient repair schemes specifically tailored to these codes can significantly reduce the network bandwidth spe...
Belief Propagation Decodable XOR based Erasure Codes For Distributed Storage Systems
LDPC codes and digital fountain techniques have received significant attention from both academia and industry in the past few years. There has also been extensive interest in applying LDPC code techniques to distributed storage systems such as cloud data storage in recent years. This paper carries out the theoretical analysis on the feasibility and performance issues for applying LT codes t...
Data Insertion and Archiving in Erasure-Coding Based Large-Scale Storage Systems
Given the vast volume of data that needs to be stored reliably, many data centers and large-scale file systems have started using erasure codes to achieve reliable storage while keeping the storage overhead low. This has invigorated the research on erasure codes tailor-made to achieve different desirable storage system properties such as efficient redundancy replenishment mechanisms, resilience...
In Search of I/O-Optimal Recovery from Disk Failures
We address the problem of minimizing the I/O needed to recover from disk failures in erasure-coded storage systems. The principal result is an algorithm that finds the optimal I/O recovery from an arbitrary number of disk failures for any XOR-based erasure code. We also describe a family of codes with high fault tolerance and low recovery I/O, e.g. one instance tolerates up to 11 failures and r...